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The  computation  of  robust  M-estimates  of  regression  is  considered 
in  detail  using  the  <p  functions  of  Huber,  Andrews,  and  Hampel.  The 
computation  of  M-estimates  of  regression  is  considered  for  linear  models, 
linear  models  with  vector  observations,  and  nonlinear  models.  Examples 
are  given  using  actual  data  for  each  of  these  different  classes  of  models. 
Careful  attention  is  given  to  the  important  problem  of  convergence  of 
M-estimates  with  redescending  ip  functions.  A lengthy  treatment  of  this 
problem  is  given  for  the  Daniel  and  Wood  data  by  considering  several 
starting  methods  for  the  iterative  solution  and  different  breakpoints 
for  the  i i>  functions. 
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1.  INTRODUCTION 


The  estimation  of  coefficients  in  a linear  regression  model  by 
least  squares  has  long  been  plagued  by  the  possible  presence  of 
outliers,  i.e.,  observations  which  for  some  reason  do  not  belong  with 
the  major  portion  of  the  observations  or  with  the  regression  model. 

To  quote  Huber  [1],  "even  a single  grossly  outlying  observation  may 
spoil  the  least  squares  estimate  and  moreover  outliers  are  much  harder 
to  spot  in  the  regression  case  than  in  the  simple  location  case." 

Several  robust  alternatives  to  the  use  of  least  squares  in  estimating 
the  coefficients  in  a linear  regression  model  have  been  developed  which 
are  outlier  resistant.  Robust  statistical  methods  may  be  loosely 
described  as  those  which  will  perform  well  under  a variety  of  underlying 
distributions  or  in  the  presence  of  observations  from  contaminating 
distributions. 

Robust  estimation  methods  have  been  classified  by  Huber  [1]  and  [2]. 
Huber's  classifications  are  termed  L-estimates,  M-estlmates,  and  R-estimates. 
The  L-estimates  are  formed  as  linear  combinations  of  the  order  statistics. 

The  a - trimmed  mean  is  an  example  of  an  L-estimate  for  a location 
parameter.  The  R-estimates  are  derived  on  the  basis  of  rank  tests.  The 
estimate  of  location  obtained  by  taking  the  median  of  all  pairwise 
averages  of  the  observations  is  an  R-estimate.  Probably,  the  most 
popular  robust  regression  methods  are  the  M-estimates.  Their  popularity 
stems  from  their  generality,  their  close  computational  relationship  to 
least  squares,  and  the  ease  of  numerical  computation. 
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2.  M-Estimates  for  Regression 
Given  the  linear  model 


y<  V,  Vj  * ei  i‘,>n 


(i) 


where  the  regression  parameters  e,  are  unknown  and  to  be  estimated 

J 

from  a knowledge  of  the  values  y.  and  x. ..  The  M-estimates  of 

I I J 

e.  minimize 

*3 


n 

r P 
i=l 


' z xii0i 
j=l  1J  J 


(2) 


where  p(-)  is  some  suitable  function.  Differentiating  (2) 
leads  to 


n T 

E X.  y(y  - x.e)  = 0 
i=l  1 1 1 


(3) 
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where  xj  = col  (x^,  xi2,---,  xin)  and  ’*'(*)  is  the  deri’vat^ve  of 
p ( * ) - (3)  is  the  analog  of  the  normal  equations  in  least  squares 

regression.  The  estimate  which  results  from  solving  (3)  is  called  an 
M-estimate.  Rather  than  specifying  the  function  p,  M-estimates  are 
usually  described  by  specifying  the  function  y.  If  f(y;e)  is  the 
probability  density  function  underlying  the  observations  and  if  we 

choose  y = — /f(y;e),  then  the  M-estimate  obtained  is  the 

maximum  likelihood  estimate.  Since  the  function  p is  usually  not 
homogeneous,  as  it  would  be  in  least  squares,  the  M-estimates  obtained 
would  usually  not  be  scale  invariant.  To  force  scale  invariance  we 
instead  minimize 


!p 


where  s is  some  measure  of  dispersion  of  the  residuals, 

y.  - x.  .0 ..  The  quantity  s also  needs  to  be  robust. 

1 j=l  1J  J 

Several  v functions  have  been  proposed  in  the  literature.  The 
original  v function  proposed  by  Huber  limits  the  sensitivity  of 
the  estimator  to  gross  errors  in  the  data.  This v function  is  given  by 


'y(x)  = 


a sgn  (x)  |x|>  a 


HUBER  - * 

A 'y  function  of  a different  type  which  Is  an  example  of  the  redescending 
type  of  v function  and  which  provides  rejection  of  gross  errors  as  well 
as  limited  error  sensitivity  Is  the  function  proposed  by  Hampel  [3]. 


|x|<a 
a<  |x |<b 
b<|x|<c 
|x|>c 
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Y functions  have  also  been  proposed  by  Tukey  [5]  and  Ramsay  [6] 
Tukey's  'V  function  is  given  by 

*('-j2)2  |X|<* 

’(X>=1  0 |x|>a 

Another  f function  which  was  proposed  by  Ramsay  is 
H'(x)  = xe 
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The  Ramsay  v is  of  the  redescending  variety  but  the  descent  is  very 
slow  in  comparison  with  other  redescending  v functions.  Two  f 
functions  which  are  special  cases  of  the  Hampel  t which  we  have 
found  to  be  useful  are  for  a=b=c  and  b=c.  We  will  call  these 
and  H_,  respectively.  They  are  given  by 


k 


Estimates  with  bounded  ¥ function  tend  to  be  robust.  If  the  ¥ function  also 
returns  to  zero  the  estimator  will  tend  to  reject  the  more  gross  outliers  and 
will  be  robust  for  a larger  proportion  of  outliers.  However,  the  p functions 
corresponding  to  the  ¥ functions  of  the  redescending  class  are  not  convex. 
Therefore,  the  numerical  solution  for  M-estimates  using  a redescending  ¥ 
function  may  result  in  an  estimate  which  does  not  correspond  to  a global 
minimum  of  (4).  This  convergence  to  the  wrong  estimate  may  result  in  a 
degraded  robust  estimate.  We  shall  exhibit:  this  type  of  behavior  for  a 
Hampel  ¥ in  Section  8,  which  deals  with  the  Daniel  and  Wood  data. 


r 


As  an  example  of  the  ability  of  M-estimates  to  detect  outliers 
consider  the  data  set  below  which  is  a time  sequence  of  real 
angular  measurement  data  and  contains  some  gross  outliers  which 
are  obvious  by  inspection.  A quadratic  curve  is  fit  to  the  data  for 
the  purpose  of  determining  the  outliers. 


Observations 

Residuals  from 

Least  Squares  Fit 

Residuals  from 
Robust  Fit 

-1.70987 

-.157774 

.000012 

-1.70942 

-.000204 

.000004 

-1.70893 

.105480 

.000003 

-1.70845 

.159227 

-.000015 

-1.70793 

.161087 

-.000010 

-1.70741 

.111021 

-.000021 

-1.70682 

.009099 

.000022 

-1.70626 

-.144780 

.000019 

-1.70571 

-.350595 

-.000010 

-1.70510 

-.608277 

.000005 

-1.70449 

-.917885 

.000004 

1.43777 

1.862231 

3.141637 

1 . 44602 

1.456410 

3.149243 

-1.70257 

-2.158177 

-.000007 

1.44667 

.473139 

3.146558 
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The  residuals  from  the  ordinary  least  squares  fit  do  not  yield 
any  information  about  the  outliers  in  the  data  whereas  the  outliers 
among  the  residuals  from  the  robust  M-estimate  are  obvious.  The 
robust  M-estimate  for  this  example  used  a Hampel  y with  breakpoints 
3,  6,  9. 

Another  example  which  has  residuals  in  all  regions  of  the  Hampel 
^-function  is  the  following  data  set. 


LEAST 

NORMALIZED 

SQUARES 

ROBUST 

ROBUST 

RESIDUALS 

RESIDUALS 

OBSERVATION 

RESIDUALS 

1 

-.011022 

-.000278 

.20642275 

1.005559 

2 

-.009071 

-.000006 

.20973521 

.020803 

3 

-.007471 

-.000033 

.21296912 

.120171 

4 

-.005711 

.000151 

.21663652 

.546808 

5 

-.004461 

-.000123 

.22006619 

.445501 

6 

-.002730 

.000136 

.22425138 

.492246 

7 

-.001590 

-.000144 

.22811853 

.519552 

8 

-.000213 

-.000135 

.23249603 

.487267 

9 

.001201 

-.000038 

.23718297 

.136926 

10 

.002489 

-.000014 

.24201791 

.051 970 

11 

.003798 

.000082 

.24714760 

.297949 

12 

.005624 

.000748 

.25306741 

2.703007 

13 

.005421 

-.000564 

.25723122 

2.037977 

14 

.008660 

.001617 

.26510980 

5.845255 

15 

.006010 

-.002037 

.26737381 

7.361710 

16 

.009663 

.000662 

.27621340 

2.394020 

17 

.011016 

.001113 

. 28302583 

4.023731 

18 

.010359 

-.000392 

.28810282 

1.418292 

19 

.011568 

.000019 

.29531815 

.067036 

20 

.012005 

-.000291 

.30203451 

1 .051051 

21 

.012861 

-.000129 

.30944403 

.464413 

22 

.013557 

-.000075 

.31696650 

.269818 

23 

.014001 

-.000222 

.32450959 

.800901 

24 

.014501 

-.000260 

.33238295 

.938668 

25 

.015039 

-.000209 

.34056693 

.754131 

26 

.015433 

-.000249 

.34888132 

.898506 

27 

.015913 

-.0001 52 

.35755414 

.547835 

28 

.016283 

-.000113 

.36639033 

.406971 

29 

.016494 

-.000181 

.37534057 

.654202 

30 

-.058265 

-.075167 

.30959446 

271.639732 

31 

-.172487 

-.189565 

.20465789 

685.052254 

32 

.018472 

.001270 

.40517605 

4.589248 

33 

-.064416 

-.081690 

.3321 2063 

295.211357 

34 

.089274 

.071980 

.49591643 

260.122231 

35 

-.251831 

-.269092 

.16519139 

972.446930 

36 

.007852 

-.009326 

.43552655 

33.701152 

37 

. 1 59606 

.142564 

.59820610 

515.197899 

38 

.059168 

.042313 

.50896735 

152.912771 

39 

.016704 

.000088 

.47797510 

.318960 

40 

.016296 

-.000028 

.48931307 

.101770 
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The  solution  for  the  M-estimate  used  a least  square  starting 
solution  and  a Hampel  ¥ function  with  breakpoints  at  2.5,  5,  and 
7.5.  In  the  list  of  least  squares  residuals  given  above  some  of 
the  outliers  are  obvious  while  others  are  not.  The  column  of  nor- 
malized residuals  is  merely  the  robust  residual  divided  by  the 
robust  dispersion  measure  s.  If  we  declare  that  residuals  greater 
than  2.5  s are  outliers  then  we  would  flag  observations  12,  14,  15, 

17,  30,  31,  32,  33,  34,  35,  36,  37,  and  38  as  outliers.  Some  of 
these  outliers  are  much  more  gross  than  others.  The  M-estimate  of 
the  parameter  vector  is  e0  = .20388,  el  = .05419,  e2  = .04427. 

This  example  is  simulated  data  so  that  the  true  parameter  vector  is 

known  to  be  e0  = .20388,  6*  = .0537,  e2  = .0445.  The  least  squares 

(0)  (0)  to) 

starting  solution  was  e0  = .21636,  ej  = .01901,  02  = .05466. 

3.  Numerical  Compu  a .ion  of  M-Estimates 

One  of  the  most  attractive  features  of  least  squares  estimation  is 
the  ease  of  numerical  solution.  One  might  be  inclined  to  think  that 
the  numerical  solution  for  M-estimates  would  in  many  cases  be 
prohibitive.  This  is  not  the  case.  At  worst  (4)  can  be  minimized  by 
one  of  the  many  algorithms  for  minimization,  e.g.,  the  Fletcher  - Powell 
[7].  However,  either  a Gauss-Newton  or  a weighted  least  squares  solution 
can  usually  be  applied  to  obtain  the  M-estimate. 
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The  Gauss-Newton  method  can  be  applied  to  the  computation  of 
M-estimates  by  linearization  of  (4)  or  (5)  below.  Setting  the  deri- 
vative of  (4)  with  respect  to  e equal  to  zero 


N T v,  - X.e 
z xT  y(  1 - - 1-  ) = 0 
i=l  1 s 


(5) 


Since  (5)  is  in  general  nonlinear  in  0,  we  must  usually  employ  some  form 

* (kl 

of  iteration  for  solution.  Suppose  we  have  obtained  an  estimate  0 v ' 
in  the  iteration  sequence.  We  will  discuss  methods  for  obtaining  a 

starting  solution  0 ^ in  a later  section.  Linearizing  (5)  about  0^ 


where  r^)  = yi  - X^0  ^ ^ 
solving  for  e^k+1^ 


;(k+l)  = 0(k) 


+ M 


N 

Z 

i*l 


r (k) 

*( T > XI 


(7) 


where 


M = 


N 

z 

i=l 


Ck) 


w • (— 1- 
* ' S 


(8) 
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(7)  and  (8)  are  iterated  until  ||e^+^  - e^||  is  less  than  some 
prescribed  value  or  for  a fixed  number  of  iterations. 

A somewhat  simpler  method  for  solution  is  obtained  by  approxi- 


mation of  the  Gauss-Newton  method.  Replacing  y 
equations  by  its  sample  mean 

N r. (k)  T 

_ z y(-i  )xT 

N e y (-7  ) 


tU,n 


the  above 


where 


M = E X!X. 
i=l  1 1 


The  advantage  of  this  simplified  method  is  that  M and  its  inverse  need 
to  be  calculated  only  once  during  the  iteration  procedure. 

A simple  method  for  the  computation  of  M-estimates  which  has  achieved 
considerable  popularity  is  the  iterative  application  of  weighted  least 
squares.  We  rewrite  (5)  as 


. 


x^i  Xi6  ) = 0 


Now  let 


y,  - X.e 
4«(— ! i- 

V9)  = s 


) 


y.  - X.e 


(12) 


Then  (11)  is 


£ W.(e)xT(yi  - 


i=l 


i ' i 


Xi6)  = 0 


(13) 


(13)  can  be  solved  iteratively  as  follows.  Let  e*k)  be  an  arbitrary  point 
in  the  iteration  sequence.  Then  we  approximate  (13)  by 


I W.(e(k))x{(y.  - X.0(k+1))  = 0 

i=l  11  1 


(14) 


Solving  (14)  for  e^k+1^ 


Thus,  we  can  use  an  ordinary  weighted  least  squares  algorithm 
iteratively  to  obtain  the  M-estimate. 

Throughout  the  discussion  of  M-estimates  we  have  used  the 
dispersion  measure  s of  the  residuals  without  any  consideration 
for  its  computation.  Robust  dispersion  measures  are  often  taken  to 
be  a multiple  of  the  interquartile  range  or  of  some  other  range 
statistic.  A dispersion  measure  which  has  been  popular  with  those 
using  M-estimates  is  the  median  deviation  or  the  MAD  (Median  of  the 
Absolute  Deviations)  estimate  as  it  is  sometimes  called.  The  MAD 
estimate  for  regression  is  defined  by 


s = med|r.. | j .6745 


(16) 


where  r^=yi  - X^.  Hampel  [3]  has  shown  that  the  MAD  estimate  is  the 
most  robust  estimate  of  dispersion.  In  the  iterative  schemes  described 
above  a new  value  of  s is  computed  at  each  stage  of  the  Iteration  using 
the  most  recent  set  of  residuals.  Thus  in  obtaining  an  estimate 


e^k+1 ^ we  use 


s=med|r.(k)|  j. 6745 


(17) 


where  r. 


= y.  - X.e(k), 

J-\  i 
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Testing  of  the  Gauss-Newton  and  the  weighted  least  squares  methods 
for  the  computation  M-estimates  on  the  Daniel  and  Wood  data,  which  is 
presented  in  a later  section  showed  that  the  weighted  least  squares 
method  to  be  far  better  than  Gauss-Newton.  The  Gauss-Newton  had  very 
poor  convergence  properties  for  this  data,  especially  when  using  the 
Andrews  ¥ function. 


4.  Covariance  of  Estimates 

An  approximate  covariance  for  an  M-estimate  can  be  obtained  from 
the  Gauss-Newton  method.  Assuming  the  observation  errors  e.  and  e.  in 

* J 

(1)  to  be  statistically  independent  we  use  (7)  and  (8)  to  obtain  the 
approximate  covariance  for  e. 


cov(e)  E 


-i  N T -i 
M ( z XX. )M 
j=l  J J 


08) 


We  further  approximate  cov(e)  by  replacing  the  expectation  in  (18) 
by  its  sample  mean.  Thus,  we  obtain 


cov(e) 


N 2 y i 

Z ¥ (— 

n-pi=l 


1 


- X.e  - 

tt-Mm 


N t 

( l X X.)M 
j = l J J 


(19) 
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Corresponding  to  the  approximation  used  to  obtain  (9)  and  (10) 
we  can  further  approximate  (19)  by  replacing  f'(-i)  in  M by  its  sample 

mean.  Using  this  in  (19)  gives  an  alternative  approximation  to  the 
covariance 


cov(e)  = 


i N 2 - X.e 

JL  y v ill 

n'P  j=1  * s } 


1 r yi  “ Xi® 

I I y* (-J-  J-) 


j=l 


(20) 


In  [1]  Huber  considers  the  asymptotic  bias  of  the  expressions  (19)  and 

(20).  Huber  also  gives  another  alternative  approximation  to  the  covariance 
for  an  M-estimate. 


5.  Starting  Solutions 


Any  of  the  numerical  methods  used  to  obtain  an  M-estlmate  requires 
a starting  or  preliminary  estimate  of  the  regression  parameters  e. 

The  starting  solution  is  of  primary  importance  and  for  some  cases  will 
determine  whether  or  not  a usable  M-estimate  is  obtained.  Robust 
estimation  using  ¥ functions  of  the  redescending  type  is  especially 
sensitive  to  the  starting  solution  because  the  solution  iteration  may 
converge  to  a local  minimum  which  is  relatively  remote  from  the  global 
minimum,  if  a poor  starting  solution  is  used.  At  best,  poor  starting 
solutions  require  more  iterations  for  convergence.  The  most  obvious 
solution  with  which  to  start  the  M-estimation  iteration  is  the  un- 
weighted least  squares  solution.  However,  since  the  unweighted 

least  squares  solution  is  highly  influenced  by  the  presence  of  outliers, 

-(o) 

it  may  not  provide  a suitable  starting  solution,  e . Nevertheless, 

least  squares  is  often  useful  for  starting.  In  some  cases  where  the  y^ 

are  small  and  the  components  of  e are  also  small  the  starting  solution 
-(o) 

e =0  may  be  useful.  This  is  often  the  case  in  instrument  calibration, 
see  [8], 

A good  starting  solution  should  itself  be  a robust  estimate  of  the 
regression  coefficients.  Although  the  use  of  a robust  starting  solution 
may  greatly  increase  the  computing  time,  it  will  often  be  necessary  if 
the  two  simple  procedures  mentioned  above  fail.  Several  robust  regression 
methods  which  are  suitable  starting  procedures  for  M-estimates  are 
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described  in  [9].  One  of  the  simplest  of  these  methods  is  an  extension 
of  the  method  proposed  by  Theil  [10].  In  applying  this  method  we  include 
a constant  term  separately  from  the  other  terms  in  the  linear  model. 

We  then  apply  a Gram-Schmidt  orthogonal ization  process  to  the  remaining 
independent  variables.  The  computation  of  the  values  X!.  of  the 
orthogonal  variables  is  given  by 


j-i 

XI,  = X,,  - z r ..  XI. 


ij 


ij 


k=l 


jkik 


(22) 


rjk 


N 

Z 

i=l 


XijXik 


(23) 


In  terms  of  the  orthogonal  independent  variables  the  linear  model 
given  by 


y,  = 0. 


P-1 

z 


XI, e: 


ev 


i=l,N 


(24) 
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Estimates  of  the  regression  coefficients  e'  are  obtained  using  our 

J 

modified  method  of  Theil  by  the  following  process. 


y.  - y. 

dm(i,j)  = y-r — j>i  i = l ,N-1 

jm  im 


60 ' = med  d (i , j ) 
m . . m 


3.  e’  0'  + 60 ' 

m m m 


4'  yi  - se;  Xim  i’1’N 


5.  Repeat  steps  1-4  until  convergence. 

6.  eQ  = med  y. 


> m=l  ,p-l 


In  the  above  med  z.  means  to  take  the  median  of  the  variables  z.  over  the 
i i 

index  set  i.  In  order  to  recover  the  original  regression  coefficients, 
it  is  necessary  to  apply  the  Gram-Schmidt  process  to  the  ej. 
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(25) 


1-1 

ep-l-i  0p-l-i  ' Vl-j,p-l-iep-l-j  i=1»P‘2  (26) 

For  even  moderate  values  of  N the  number  of  slopes  dm(l ,j ) which  must 
be  computed  is  quite  large.  Rather  than  use  all  of  these  slopes  we 
can  instead  work  with  a reduced  number  of  slopes.  One  possible  reduced 
set  of  slopes  can  be  obtained  letting  the  x'ijn  be  arranged  in  increasing 
order  for  each  m and  let  N*  = [^].  Thus,  if  N is  odd  xN*m  is  the  median 
of  the  xlm,  i=l,N.  We  then  use  the  slopes 


d (i ) 
m 


y+i : *i 

xN*+i,m"  x'im 


1=1  ,N  (N  even) 

1=1, N*  - 1 (N  odd) 


These  slopes  are  then  used  in  step  2 of  the  iteration  process  with 


60  = med  d (j ) . 

m . m 
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Another  robust  regression  method  for  obtaining  a starting  solution 


for  M-estimates  is  an  application  of  Spearmans  p as  described  in  [9]. 

We  again  form  a set  of  orthogonal  independent  variables  xlm  i=l,N  by 

applying  the  Gram-Schmidt  process  in  (21)  - (23).  Let  Rx  be  the  rank 

im 

of  x'.  among  the  x‘.  j=l»N  and  let  R be  the  rank  of  y.  among  the 
im  jm  y.j  1 

y.,  j=l ,N.  Then  Spearmans  p,  a nonparametric  estimate  of  the  population 
<3 

correlation  coefficient  is  defined  as 


u\  ■ rx>(v v 

i=l  xim  \ y-\  y 


VBx.  -v2 


(27) 


u v v N+l 
where  Rx  = R = -p  . 

m * 


is  just  the  ordinary  defining  equation  for  the  correlation  coefficient  with 
the  variates  replaced  by  ranks.  A more  useful  definition  of  px^  for 
computing  is 


N 

LLV 


N(N2  - 1) 


(28) 
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where  is  the  rank  difference 


d.  = R - R 
1 y1  xim 


In  an  orthogonal  regression  model  the  estimates  of  the  regression 
coefficients  may  be  written  as 


e'  = p 
m 


A A A 

where  p , a , a are  the  usual  sample  correlation  coefficient  and 
xnr  y xm 

standard  deviations.  An  obvious  method  of  obtaining  a robust  estimate 
of  e'  is  to  replace  p w,  a.  a in  (29)  by  nonparametrtc  estimates 

m V y xm 

of  these  quantities.  Thus,  we  replace  px  by  Spearmans  p and  replace 

fir 

a by 

y * 

medjy.  - y | 

; = J (3£ 

y .6745 


where  y*  = med  y..  We  could  also  replace  o by  an  estimate  similar  to 

i 1 „ 2 i N , m-  2 

(30)  but  in  most  cases  a = «rr  t (xim  - xm)  is  sufficient.  The 

m 1*1  , , 

process  is  used  iteratively  to  improve  the  estimate  of  em-  The  procedure 

is  implemented  by  the  following  steps. 
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4. 


Repeat  steps  2-3  until  convergence. 

5.  e = med  y, 
o i i 


As  before  we  must  apply  the  Gram-Schmidt  process  to  the  8m  in  order 
to  recover  the  original  regression  coefficients. 

A third  method  for  obtaining  a robust  starting  solution  is  the 

orthogonal  Brown-Mood  method.  This  is  a variation  of  the  Brown-Mood 

method  [11]  which  uses  orthogonal  independent  variables.  Let 

*'  m=l ,p-l ,i=l,N  be  a set  of  orthogonal  independent  variables 
Aim’  r * 

obtained  by  applying  the  Gram-Schmidt  process.  Let  ^ be  the  median 

~(k) 

of  the  x'.  i=l,N.  The  Brown-Mood  method  is  iterative  so  let  em 

im  (k) 

be  some  estimate  in  the  iteration  sequence  and  let  r.  be  the 

residual s 


p-1 

l 

m=l 


x*  ®m(k) 
im  m 


(31) 
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.00 

The  Brown-Mood  method  computes  corrections  60^  to  em  by 


r 00+  . r 00" 
ri  i 

60'  = . 

m x*  - x 

m m 

(32) 

where 

x!  = med  x! 

, T im 

ieIu 

*U  = <l|xim  - V 

(33) 

x"  = med  x‘. 
m ieIL  im 

iL  = hi>;,<v 

(34) 

r.(k)+  = med  r.(k) 

1 ieIu 

(35) 

r.(k^  = med  r.(k) 

1 iElL 

(36) 

A (k+1 ) . (k) 

The  estimates  are  updated  by  0^  ^ em  + 69m  and  the  above 

procedure  is  iterated  to  convergence.  Finally,  the  estimate  of  0Q 


is  obtained  from 


The  orthogonal  Brown-Mood  method  is  implemented  by  the  following  steps 
(starting  with  e ^ = 0) 


1 ' "m  = m?d  x; 


im 


2.  x = med  x'. 


iel 


im 


U 


x’  = med  x] 


m 


iel, 


im 


3. 


med  y. 


iel 


U 


yT  = med  y. 
iel 


4. 


L 

i ^ 1 


+ 

, - y 

60'  = + 

£ - >C 

m m 


e1  e'  + 60' 
m mm 


> m=l ,p-l 


5. 


y,  - 60'  xl  i=l ,N 
i m im 


6.  Repeat  steps  3-5  until  convergence 

7.  0 = med  y. 

o i i 


J 
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6.  Robust  Regression  with  Vector  Observations 

The  problem  of  linear  regression  with  observations  of  more  than  one 
dependent  variables  is  quite  common.  In  this  case  we  are  given  N obser- 
vations of  each  dependent  variable  ya>  a=l,m.  We  denote  these  observations 
by  ya(i),  i=l»N  , a=l»m-  The  vector  of  parameters  to  be  estimated  is  still 
denoted  by  e.  The  observations  are  related  to  the  parameter  vector  by  the 
linear  model 

y(i)  = A(i)e  + e(i ) i=l,N  (37) 

where  A(i)  is  an  mxp  matrix,  y(i)  is  an  m-vector  of  observations  and  e(i)  is 
an  m-vector  of  measurement  errors.  A least  squares  estimate  of  e would 
minimize 

l (y( i)  - A(i)e)T  (y(i)  - A(1)e)  (38) 

i*l 

A robust  alternative  to  the  least  squares  estimate  would  minimize 


N m 

l l Pa 

i=l  o=l  “ 


y„0)  - aCT(l)e 


(39) 


where  a (i)  is  the  ath  row  of  A(i)  and  Pa(*)  may  be  a different  function 
for  each  of  the  dependent  variables,  and  sa  is  a robust  estimate  of 
dispersion  for  the  residual  y^i)  - aa(i)e.  Setting  the  derivative  of 
(39)  with  respect  to  e to  zero  gives 


N m aT(1) 

I I — 

1=1  o=l  Sa 


y„(l>  - «a(l)e 


= 0 


(40) 
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(40)  can  be  conveniently  be  rewritten  as 


AT(i)D_1  { (oT} 


(y(1)  - A(i)e), 


where  D is  the  diagonal  matrix  D=diag  (s^,  s2>- 
is  the  vector  of  functions 


sj  and  $ 


f(x)  = 


’I'i  (x-j ) 

>I»2^x2  ^ 


Either  a Gauss-Newton  or  a weighted  least  squares  solution  can  be  used 
to  iteratively  obtain  the  M-estimate  from  (42).  If  is  an  arbitrary 
point  in  the  iteration  sequence,  the  weighted  least  squares  method  applied 
to  (42)  gives 

-(k+l)  „ M-1  J AT(i)D"1W(i)D"1y(i)  (43) 

i=l 

where  D is  the  diagonal  matrix 
D = diag  (s^  s2>  — , s ) 


M = l AT(i)D"1W(i)D"1A(i) 


W(i)  is  a matrix  of  weights  given  by 


W(i )=diag 


r!k)(i) 


1\  sn 


ri  (i ) 


rg.k)  (1 ) 


rJK;(1) 


r<k,0) 


mV  s„ 


where  r^(i)  is  the  residual 

a 

rfk)(i)  = yJD  - a (i)e(k) 


(46) 


As  an  example  of  robust  linear  regression  with  vector  observations 
consider  the  calibration  of  a laser  tracker.  The  laser  tracker  measures 
the  range,  azimuth  and  elevation  of  M targets  with  known  range,  azimuth, 
and  elevation.  Calibration  constants  for  the  tracker  are  computed  by 
comparing  the  observations  against  the  known  positions  of  the  M targets. 

Let  R . , E . , and  A . be  the  known  surveyed  range,  azimuth,  and  elevation 

5j  5j  SJ 

of  the  jth  target.  Suppose  that  multiple  observations  of  the  targets  are 
available  so  that  we  have  N.  observations  for  the  jth  target.  Denote  these 

J 

range,  azimuth,  and  elevation  observations  by  R. .,  A...  and  E..,  1*1,  N., 

I J I J * J J 

j=l,M.  Let 


- Rsj  ■ rIe  4 r1i 

- \j = aje  4 

- Esj  = cj9  + c1j 


where  e is  an  unknown  parameter  vector,  r^,  a^,  and  e^  are  known  vectors, 
and  r..,  a..,  e . . are  random  error  terms.  A common  model  for  r.,  a., 

J J 

and  e.  is  given  by 

J 


V'8!4  e2  Rsj 


(47) 


aTe  = e,  - e-  tanE,.cosA,.  - ec  tanE^.sinA,.  - ec  cos  E„. 


'sj 


sj 


sj' 


sj 


"sj 


(48) 


eje  = 97  + e4  sinASJ.  - 05  cosA$j 


(49) 
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The  following  illustrates  the  application  of  the  above  to  real  field 

' 


data.  The  laser  tracker  is  calibrated  using  range,  azimuth,  and  elevation 
measurements  from  eight  reflective,  surveyed  targets  arranged  in  a circular 
pattern  around  the  tracker  at  a range  of  about  2500  feet.  We  use  the  model 
in  (47)  - (49).  Since  the  elevations  of  the  eight  targets  are  approximately 

32 
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equal,  it  is  obviously  impossible  to  estimate  8g  in  (48)  without  additional 
observations.  In  order  to  provide  these  extra  observations,  we  observe  the 
same  calibration  targets  but  with  the  tracker  "dumped",  i.e.  with  an  azimuth 
of  approximately  A$1-  + 180°  and  an  elevation  of  approximately  Esi  - 180°. 
These  additional  observations  are  called  dumped  readings  and  are  treated  as 
additional  calibration  targets.  Also,  it  will  not  be  possible  to  estimate 
e2  in  (47)  since  all  ranges  are  approximately  equal.  In  order  to  estimate 
e2,  we  observe  four  additional  targets  with  ranges  varying  from  20,000  feet 
to  60,000  feet.  Robust  estimation  of  e was  done  for  this  example  using  a 
Hampel  4 function  with  breakpoints  a=2.5,  b=5.0,  and  c=7.5.  Approximately 
250  observations  are  available  for  each  target.  The  results  of  this  robust 
calibration  are  summarized  in  the  following  table  by  tabulating  the  nunber 
of  residuals  for  each  target  lying  in  each  region  of  the  Hampel  iJj.  The 
number  of  residuals  in  each  region  is  the  sum  of  the  number  in  the  positive 
and  corresponding  negative  regions  of  the  41  function.  The  first  eight 
target  boards  are  at  2500  ft.  circularly  about  the  tracker.  Targets  9-12 
are  the  long  range  target  boards.  Targets  13-20  are  "dumped"  readings  of 
the  first  eight  targets.  From  the  table  it  is  obvious  that  most  of  the 
observations  from  several  target  boards  are  outliers,  particularly  for 
the  "dumped"  readings.  This  example  has  about  22%  contamination  by  outliers 
which  is  extreme  for  this  application,  but  illustrates  the  power  of  the 
M-estimation  process  in  dealing  with  many  outliers. 
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NUMBER  OF  RESIDUALS 


7.  Nonlinear  Regression 

Instead  of  estimating  regression  parameters  in  the  linear  model 
suppose  we  want  an  M-estimate  of  the  parameter  vector  0 in  the  nonlinear 
mode! 


y.  = f . (0)  + ei , i=l ,N 


(52) 


where  f ^ ( - ) is  a given  nonlinear  function, 
is  obtained  by  minimizing 


Then  an  M-estimate  of  0 


(53) 


Differentiating  (53)  with  respect  to  0 gives  the  nonlinear  equations 


l rt(e)  * 

i=l  1 


= 0 


(54) 


where  F^e)  is  the  derivative  vector 


F.j  (e)  = 


3^(0) 


ae 


0=0 


(55) 


(54)  can  be  solved  by  iteration.  Either  Gauss-Newton  or  weighted  least 
squares  iteration  can  be  used  to  solve  (54).  Suppose  we  use  weighted 
least  squares.  We  rewrite  (54)  as 
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■ 


i 


*(k) 

Let  ev  ' be  an  arbitrary  point  in  the  iteration  sequence.  Linearizing 
(56)  about  ev  ' and  discarding  higher  order  terms  gives 
N 


l W.(e(k))  F{(e(k))  [y.  - F (e(k))(e 
i=l  1 1 l 1 1 


(k+1>-9(k>)\  =0 


(57) 


where 


W.(e^)  = 


Vi  - ^i(e(k)) 


yi  - f1(8(k)) 


(58) 


Solving  (57)  for  9^k+^  ^ 


«(k+D=e(kM  v 


-1 


+ (J!1«J(i(.k,)Fj(i(k,)FJ(5(k,)j  k))yf 


The  choice  of  starting  solution  for  a nonlinear  problem  presents  additional 
difficulty  if  the  unweighted  least  squares  solution  Is  not  suitable.  Methods 
for  obtaining  other  starting  solutions  are  dependent  on  the  nature  of  the 
problem. 

As  an  example  of  the  application  of  M-estimates  with  a nonlinear  model 
consider  the  N-station  cinetheodolite  trajectory  data  reduction  problem. 

In  this  situation  we  are  given  azimuth  observations  aa(tj)  and  elevation 
observations  ea(t^),  a=l,N.  at  each  time  point  t^  along  a trajectory. 

From  these  Ni  clnetheodolites  (tracking  cameras)  we  must,  estimate  the 
cartesian  position  x(t^),  y(t^),  z(t^)  at  each  time  point.  The  observations 
are  aot(ti)=Aa0*-j)+  error  and  ea(t^  )=Ea(x'1 )+  error,  where  x,j  Is  the  3-vector 
Wt,)  y(tj)  z(tj)].  The  measurement  functions  A^(xj)  and  E^Cxj)  are  given 
by 


1 
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1/2 


A (x.)  = tan"1 

a i 


x<*1>  - *g 


Ea(x1)  = tan"1 


z(t1)  ~ za 

C(x(t.)  - xa)2+(y(ti)  - ya)2] 


where  (xa,  ya,  zj  is  the  cartesian  position  of  the  ath  camera.  In  this 
case  we  have  a nonlinear  regression  problem  with  vector  observations.  In 
this  application  we  minimize 


As  a numerical  example  of  this  application  consider  the  following  situation 
which  is  rather  extreme  but  sometimes  occurs.  A missile  is  fired  at  a drone 
and  cinetheodolites  are  observing  both  the  missile  and  the  drone.  It  is 
required  to  estimate  trajectories  for  both  the  missile  and  drone.  Due  to  an 
inadvertent  clerical  error,  one  of  the  cameras  which  was  actually  observing 
the  missile  was  erroneously  listed  as  observing  the  drone.  Obviously,  when 
doing  a least  squares  solution  to  obtain  the  drone  trajectory,  the  azimuths 
and  elevations  from  one  camera  will  be  gross  outliers  and  will  destroy  the 
least  squares  solution  for  the  drone  position  coordinates.  A single  point 
of  this  situation  is  given  by  the  data  below. 

Camera  Obs.  Azimuth  Obs.  Elevation 


1 

.568106 

.338886 

2 

-.626010 

.122620 

3 

-2.665036 

.359168 

4 

1 . 926249 

.327177 
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Camera  2 is  the  one  which  is  actually  tracking  the  missile  rather  than  the 

drone.  Obviously,  as  in  most  situations  which  are  the  nonlinear,  there  Is 

no  way  of  distinguishing  the  outliers  by  Inspecting  the  observations.  As 

always  in  robust  estimation  a preliminary  solution  Is  required  to  start  the 

iteration.  Let  (xa,  ya,  z^)  be  a position  solution  obtained  from  the  ath 

pair  of  cameras.  In  this  example  we  have  six  possible  pairs  of  cameras  so  that 

a=l,6.  We  then  start  the  iteration  with  (x°,  y°,  z°)  where  x°=  med  x , a=l,6, 

a 

y°=  med  ya>  a=l,6,  z°=  med  za,  a=l,6.  For  the  example,  the  median  guess 
solution  is  x°=  -45147.9  ft.,  y°=  87423.8  ft.,  z°=  11117.3  ft.  After  five 
iterations  the  sequence  has  converged  to  the  solution  x=  32964.8  ft., 
y=  87425.2  ft.,  z=  11114.9  ft.  The  residuals  from  the  final  solution  are 


CAMERA 

RESIDUALS 

AZIMUTH 

ELEVATION 

1 

.000008 

-.000064 

2 

-.242553 

.011513 

3 

.000022 

.000081 

4 

.000057 

-.000019 

Thus,  the  robust  solution  using  the  Hampel  with  breakpoints  of  3,  6,  9, 
correctly  identified  the  outliers.  Let  us  carry  this  example  farther. 
Suppose  we  have  no  observations  from  camera  1,  i.e.,  we  have  data  from  only 
three  cameras  one  of  which  Is  bad.  In  this  case  our  starting  solution  turns 
out  to  be  x°=  45147.9,  y°=  87424.1,  z°=  11120.2.  After  four  Iterations  the 
solution  has  converged  to  x=  -32966,  y=  87424.6,  z=  11115.3.  Thus,  we  are 


* 


I 

I 


again  able  to  correctly  identify  the  bad  camera.  Now  suppose  we  have 
data  from  cameras  1,  2,  3.  In  this,  the  initial  guess  solution  is 
x°=  45147.9,  y°=  67033.9,  z°=  11118.9.  After  ten  Iterations  the  solution 
is  x=  -35023.9,  y=  84462.1,  z=  11004.1.  The  solution  eventually  converges 
to  the  correct  value,  but  slowly.  A third  possibility  to  have  data  from 
only  three  cameras  is  observations  from  cameras  1,  2,  4.  In  this  case 
the  guess  solution  is  x°=  -46454.3,  y°=  87548.3,  z°=  7262.7.  After 
three  iterations  the  solution  has  converged  to  x=  -35392.6,  y=  86464.3, 
z=  1044.8.  Thus,  in  this  case  the  iteration  has  converged  to  the  wrong 
solution.  In  the  last  two  cases  where  the  solution  converged  very  slowly 
and  converged  to  the  wrong  solution,  the  starting  solution  was  too  far 
from  the  correct  solution.  If  a sufficiently  good  start  had  been  provided, 
the  solution  would  have  converged  correctly  in  a few  iterations.  If  the 
number  of  cameras  were  great  enough  in  comparison  to  the  number  of  bad 
cameras,  using  the  median  of  the  solutions  obtained  from  the  camera  pairs 
provides  an  acceptable  starting  solution.  Unfortunately,  the  number  of 
cameras  is  often  no  more  than  three  or  four.  In  the  case  of  three  cameras 
the  use  of  a starting  solution  predicted  from  preceding  points  might  be  a 
desirable  procedure. 

8.  EXAMPLE  - The  Daniel  & Wood  Data 

The  Daniel  and  Wood  data  has  been  used  by  several  authors  [4],  [12], 

[13]  to  illustrate  robust  regression  methods.  The  data  is  taken  from 
Daniel  and  Wood  [4],  Chapter  5,  where  it  is  examined  in  considerable  detail. 
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The  Daniel  and  Wood  data  is  a sequence  of  21  observations  in  3 independent 
variables  given  below 


Obs  # 

y 

X1 

x2 

x3 

1 

42 

80 

27 

89 

2 

37 

80 

27 

88 

3 

37 

75 

25 

90 

4 

28 

62 

24 

87 

5 

18 

62 

22 

87 

6 

18 

62 

23 

87 

7 

19 

62 

24 

93 

8 

20 

62 

24 

93 

9 

15 

58 

23 

87 

10 

14 

58 

18 

80 

11 

14 

58 

18 

89 

12 

13 

58 

17 

88 

13 

11 

58 

18 

82 

14 

12 

58 

19 

93 

15 

8 

50 

18 

89 

16 

7 

50 

18 

86 

17 

8 

50 

19 

72 

18 

8 

50 

19 

79 

19 

9 

50 

20 

80 

20 

15 

56 

20 

82 

21 

15 

70 

20 

91 

40 
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The  linear  model  assumed  for  this  example  Is 

*i  = 90+  elxli  + e2x2i  + e3x3i  + ei  i=1’21 
The  Daniel  and  Wood  data  is  treated  here  first  by  ordinary  least  squares 

and  then  by  M-estimates  using  Huber,  Hampel,  and  Andrews  ^-functions  com- 
bined with  different  possible  starting  solutions  for  these  M-estimates. 

We  denote  the  M-estimation  process  with  a Huber  ip  function  having  a 
breakpoint  at  x=a  by  Hu(a),  and  with  a Hampel  ip  function  having  breakpoints 
of  a,  b,  c by  Ha(a,  b,  c),  and  with  an  Andrews  ijj  function  with  parameter  a 
by  An(a).  When  starting  these  M-estimation  processes  with  the  ordinary 
least  squares  solution,  we  obtain  the  following  sets  of  regression  parameter 
estimates. 


90 

el 

e2 

e3 

OLS 

-39.92 

.7156 

1.295 

-.1521 

Hu(1.4) 

-41.06 

.8249 

.9466 

-.1291 

Ha(1.4,2.8,4.2) 

a 

-42.88 

.9233 

.6736 

-.1079 

V’-4> 

-42.41 

.9257 

.6617 

-.1120 

residuals  from 

these  solutions 

are 

OBS 

JL  OLS 

Hu(1.4) 

H (1.4, 2. 8,4. 2) 

a 

And.4) 

1 3.23 

3.01 

2.43 

2.46 

2 -1.91 

-2.12 

-2.67 

-2.65 

3 4.56 

4.16 

3.50 

3.52 

4 5.70 

6.44 

6.86 

6.88 

5 -1.71 

-1.67 

-1.80 

-1.79 
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OBS 


# 

OLS 

HU(1.4) 

H fl.4,2.8,4.2) 

a 

An(1,4) 

6 

-3.01 

-2.61 

"2.47 

-2.45 

7 

-2.39 

-1.79 

-1.50 

-1.44 

8 

-1.39 

- .79 

- .50 

- .44 

9 

-3.14 

-2.31 

-1.78 

-1.75 

10 

1.27 

.51 

- .16 

- .23 

11 

2.64 

1.68 

.81 

.78 

12 

2.78 

1.49 

.37 

.33 

13 

-1.42 

-2.23 

-2.95 

-3.00 

14 

- .05 

- .75 

-1.43 

-1.43 

15 

2.36 

2.28 

2.19 

2.19 

16 

.90 

.89 

.87 

.85 

17 

-1.59 

- .87 

- .31 

- .38 

18 

- .46 

.04 

.44 

.40 

19 

o 

40 

1 

CNJ 

CM 

• 

.88 

.85 

20 

1.41 

1.53 

1.55 

1.52 

21 

-7.24 

-8.86 

-10.40 

-10.43 

In  the  above  sets  of  residuals  there  are  no  grossly  outlying  observations 
so  that  we  cannot  readily  judge  the  four  regression  methods.  The  robust 
methods  have  somewhat  smaller  residuals  than  the  OLS  method  and  possibly 
the  regression  with  the  Hampel  or  Andrews  ^-function  gives  slightly 
smaller  residuals  than  regression  with  the  Huber  ^-function.  The  non- 
parametric  measure  of  dispersion  for  the  residuals  In  each  of  the 
regressions  Is 
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OLS 

Hu(1.4) 

H (1.4,2. 8,4. 2) 

d 

A„(1.4) 

2.83 

2.49 

2.30 

2.25 

If  the  residuals  were  tested  for  outliers  against  3s,  the  OLS  regression 
does  not  indicate  any  outliers,  but  the  Huber,  Hampel,  and  Andrews  re- 
gressions indicate  that  the  21st  observation  is  an  outlier.  In  addition, 
the  Andrews  regression  shows  the  4th  observation  to  be  an  outlier.  Both 
the  Hampel  and  Andrews  regressions  show  the  21st  observation  as  a gross 
outlier  by  giving  it  a zero  weight. 

Daniel  and  Wood,  after  some  exhaustive  analysis,  declare  that  obser- 
vations 1,  3,  4,  and  21  are  outliers.  Also,  in  reading  about  the 
experiment  from  which  the  data  were  gathered,  it  is  discovered  that 
observations  1,  3,  4,  and  21  were  taken  during  transient  conditions  of  the 
plant  whereas  the  other  observations  were  taken  during  steady  state  con- 
ditions. Thus,  on  the  basis  of  Daniel  and  Woods  work  and  the  observations 
by  the  original  experimenters  observations  1,  3,  4,  and  21  are  probably 
outliers.  The  regression  solution  without  these  four  points  is  9q  = -37.65, 
e1  = .7977,  ©2  = .5773,  = -.0671.  The  failure  of  the  robust  regressions 

to  detect  all  of  the  outliers  can  be  traced,  at  least  in  the  case  of  the 
Hampel  and  Andrews  regressions,  to  the  inadequate  least  squares  starting 
solution.  We  will  demonstrate  in  the  following  that  with  a sufficiently 
good  starting  solution  the  Hampel  and  Andrews  regressions  will  converge 
to  solutions  for  which  the  outliers  are  obvious.  Suppose  we  try  the 
orthogonal  Theil  method,  the  Spearmans  p method  and  the  orthogonal 
Brown-Mood  methods  previously  described  to  start  the  M-estimate  regressions. 
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From  these  starting  methods  we  obtain  the  following  regression  coeffi- 
cients which  will  be  used  to  start  the  M-estlmates. 


k 


90 

91 

02 

e3 

Spearman  p 

-43.25 

.7578 

.8100 

-.0257 

Theil 

-40.93 

.7761 

.6928 

-.0384 

Brown- Mood 

-39.21 

.7981 

.3846 

-.0000 

OLS  (w/o  1,3,4,21) 

-37.65 

.7977 

.5773 

-.0671 

Both  the  H (1.4, 2. 8, 4. 2)  and  the  A (1.4)  regressions  converge  to  the  same 
solution  as  before  when  using  the  Spearman  p starting  solution.  Also,  the 
An ( 1 . 5 ) converges  to  the  same  solution  as  before  when  using  the  Theil 
starting  solutions.  The  An (1.4)  converges  to  a solution  for  which  the 
outliers  are  obvious  when  using  the  OLS  (w/o  1,3,4,21)  or  Brown-Mood  starting 
solutions.  Also,  the  H_ (1 . 4,2.8,4. 2)  regression  converges  to  a solution 

a 

for  which  the  outliers  are  obvious  when  using  either  the  Brown-Mood,  the 
Theil  or  the  OLS  (w/o  1,3,4,21)  starts.  The  regression  coefficients 
obtained  are 

90  91  e2  e3 

An(1.4)  from  OLS  (w/o  -37.85  .8239  .5494  -.0751 

1,3,4,21)  and  Brown- 
Mood 

H (1 .4, 2. 8,4. 2)  from  -37.39  .8113  .5548  -.0734 

a 

Brown-Mood,  OLS  (w/o 
1,3,4,21)  and  Theil 

The  residuals  from  these  solutions  are 
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An(1.4)  from  OLS  (w/o  Ha(l .4, 2. 8, 4. 2)  from  Brown-Mood 

OBS  # 1,3,4,21)  and  Brown-Mood  and  OLS  (w/o  1,3,4,21) 


1 

5.78 

6.04 

2 

.71 

.97 

3 

6.08 

6.28 

4 

8.11 

8.16 

5 

- .79 

- .73 

6 

-1.34 

-1.28 

7 

- .44 

- .40 

8 

.56 

.60 

9 

-1.04 

-1.04 

10 

.18 

.22 

11 

.85 

.88 

12 

.33 

.37 

13 

-2.67 

-2.63 

14 

-1.40 

-1.38 

15 

1.44 

1.38 

16 

.22 

.15 

17 

- .38 

- .43 

18 

.14 

.09 

19 

.67 

.60 

20 

1.88 

1.88 

21 

-8.98 

-8.81 

The  four  outliers  have  now  become  fairly  obvious  among  the  residuals. 
Both  of  the  regressions  give  zero  weight  to  these  observations. 

The  dispersion  measure  for  the  residuals  in  Andrews  regression  is  1.26 
and  in  the  Hampel  regression  is  1.44. 
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The  convergence  of  the  A (1.4)  and  H (1 .4, 2. 8, 3. 2)  regressions  on  the 
Daniel  and  Wood  data  to  a solution  close  to  the  OLS  (w/o  1,3,4,21)  regres- 
sion In  which  the  outliers  are  obvious  is  very  sensitive  to  the  starting 
solution.  The  sensitivity  of  the  robust  regressions  to  the  starting 
solution  for  the  Daniel  and  Wood  data  can  be  greatly  lessened  by  changing 

the  breakpoints  of  the  ^-functions  so  that  we  are  doing  Afl)  and  H (1,2,3) 

n o 

regressions.  Both  the  AM)  and  H_ (1 ,2,3)  regressions  converge  to  the 

n a 

same  solutions  starting  from  OLS,  Spearman  p,  Theil,  and  Brown-Mood  starting 
solutions.  The  regression  coefficients  obtained  are 


eo 

01 

e2 

AnM) 

-37.11 

.8190 

.5175 

Ha(l,  2,  3) 

-37.01 

.8183 

.5202 

The  residuals  from  these 

OBS  # 

solutions  are 
An(D 

Ha(l,  2,  3) 

1 

6.09 

6.11 

2 

1.02 

1.04 

3 

6.30 

6.32 

4 

8.24 

8.25 

5 

- .72 

- .71 

6 

-1.24 

-1.23 

7 

- .32 

- .30 

8 

.68 

.70 

9 

- .96 

- .96 

10 

.12 

.13 
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r 


OBS  # 

An  ( 1 ) 

Ha0,  2, 

11 

.77 

.79 

12 

.21 

.24 

13 

-2.74 

-2.73 

14 

-1.46 

-1.43 

15 

1.32 

1.34 

16 

.10 

.12 

17 

- .43 

- .44 

18 

.08 

.08 

19 

.63 

.63 

20 

1.86 

1.87 

21 

-8.95 

-8.92 

1 
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